Add stats service by ehinman · Pull Request #207 · DOI-USGS/dataretrieval-python

ehinman · 2025-12-30T17:00:04Z

Adds in two functions that query the two endpoints at: https://api.waterdata.usgs.gov/statistics/v0/docs

Also adds utils functions for parsing and organizing the json response. Water Data API functions could be further edited to include the stats API functions, but for now I kept them separate.

To do (1/8/26):

Add percentile values for min/max/median to match R dataretrieval
Add unit tests
Add examples

dataretrieval/waterdata/api.py

Co-authored-by: Joe Zemmels (he/him) <jzemmels@gmail.com>

… add more examples

ehinman · 2026-02-20T22:52:58Z

@jzemmels @jeffskwang-usgs this PR is ready for your review. It includes the two stats API endpoints and should mirror how the functions work in R. I'm prioritizing getting the functions in so that they can be used in the current conditions pipeline, but eventually a vignette on how to use them would be nice, too. That can be a separate PR.

Thanks for your feedback!

jzemmels

Nice work! Here's a summary of my review:

Ran examples in documentation, comparing against what's returned by dR. All looks good
Ran the unit tests locally, everything passed
We've been using the por function extensively already in the current conditions pipeline, which I think is evidence enough that the functions work as intended.

I think the only substantive differences between these and the dR functions are the naming convention por_stats vs. stats_por and your inclusion of the expand_percentiles argument. The API output can be a bit confusing depending on the exact settings of computation_type. I'm not sure if there's a precedent from other endpoints for dealing with nested data, so mentioning the subtleties somewhere might be helpful.

jzemmels · 2026-02-23T17:32:04Z

dataretrieval/waterdata/api.py

+        measured and the units of measure. A complete list of parameter codes
+        and associated groupings can be found at
+        https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
+    expand_percentiles : boolean


May be helpful to also mention that setting expand_percentiles = False and requesting 'percentiles' and one of ['median', 'minimum', 'maximum', 'arithmetic_mean'] will return a value and values column, whereas expand_percentiles = True will consolidate these columns into a single value column. Requesting just 'percentiles' and expand_percentiles = False will return just a values column. There's probably a simpler way to describe this than how I've said.

Good idea, I have added some information about this.
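The consolidation described in this thread can be pictured with a small pandas sketch. This is an illustration only, not the code merged in this PR: the helper name `consolidate_value_columns`, the `computation` column, and the toy numbers are all assumptions, and real API output carries more columns than shown here. It starts from a frame shaped like the `expand_percentiles = False` case (a scalar `value` column plus a list-valued `values` column) and folds both into a single `value` column.

```python
import pandas as pd


def consolidate_value_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Fold a list-valued `values` column into the scalar `value` column.

    Hypothetical helper for illustration; not part of dataretrieval.
    """
    out = df.copy()
    # Wrap each scalar `value` in a one-element list wherever `values` is
    # missing, so every row holds a list that can be exploded uniformly.
    merged = [
        vals if isinstance(vals, list) else [val]
        for val, vals in zip(out["value"], out["values"])
    ]
    out["values"] = pd.Series(merged, index=out.index, dtype=object)
    # One output row per list element, then keep a single `value` column.
    out = out.drop(columns="value").explode("values", ignore_index=True)
    out = out.rename(columns={"values": "value"})
    out["value"] = out["value"].astype(float)
    return out


# Toy input (made-up numbers): one scalar statistic and one percentile list.
nested = pd.DataFrame(
    {
        "computation": ["median", "percentiles"],
        "value": [12.0, None],
        "values": [None, [5.0, 10.0, 20.0]],
    }
)
flat = consolidate_value_columns(nested)
print(flat)
```

The result has one row per statistic value, which is easier to filter and plot than a mix of scalar and list columns.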

Looks good! I wasn't saying you should change the function names to match dR, just that they were different.

The read_stats_por and read_stats_daterange naming convention was to make it easier for tab-completion (i.e., someone types read_stats then tab to see the two options appear).

I think it's a good change. The same sort of thing can be applied in python. It's nice to be consistent. Now to deal with this sudden ubuntu failure, ugh.

jeffskwang-usgs · 2026-02-23T20:12:13Z

Hi @ehinman, thanks for including me on this. I've looked over the code, but I'd also like to run the unit tests. I'm unfamiliar with testing python packages, so what's the best way to go about that?

ehinman · 2026-02-23T20:41:36Z

Thanks Jeffrey! Let's see, you'll want to make sure you have the branch version of dataretrieval-python installed in your environment, plus its dependencies, plus pytest. Then open your terminal, make sure it's in the root of the repo, and run pytest; it'll find the "tests" folder and run those tests. It creates a little report-out on the PASS/FAIL status of each test.
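For anyone who prefers to launch the run from Python rather than typing the command, the steps above could be wrapped roughly as follows. This is a minimal sketch under stated assumptions: the helper names `build_pytest_command` and `run_tests` are hypothetical, and it assumes pytest and the test dependencies are already installed in the active environment and that the working directory is the repo root.

```python
import subprocess
import sys


def build_pytest_command(test_path: str = "tests", verbose: bool = True) -> list[str]:
    """Build the argument list for running pytest with the current interpreter."""
    # Using `python -m pytest` ensures pytest runs in the same environment
    # as the interpreter that has the branch version of the package installed.
    cmd = [sys.executable, "-m", "pytest"]
    if verbose:
        cmd.append("-vv")
    cmd.append(test_path)
    return cmd


def run_tests(test_path: str = "tests") -> int:
    """Run pytest as a subprocess and return its exit code (0 means all passed)."""
    return subprocess.run(build_pytest_command(test_path), check=False).returncode


# Only build the command here; call run_tests(...) from the repo root to execute it.
command = build_pytest_command("tests/waterdata_test.py")
print(" ".join(command))
```

Calling `run_tests("tests/waterdata_test.py")` would then produce the same PASS/FAIL report as running pytest by hand.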

jzemmels · 2026-02-23T20:48:01Z

I probably should figure out how to use pytest, but I just ran the test examples and manually checked that the assert statements were true.

jeffskwang-usgs · 2026-02-24T14:36:54Z

Ok, I was having a little difficulty installing things correctly to run pytest. I used pixi to build an environment from the included pyproject.toml. I needed geopandas to run the tests, and ran into issues building the environment because of a pandas version conflict:

dataretrieval-python % pixi add geopandas
Error:   × failed to solve the pypi requirements of environment 'default' for platform 'osx-arm64'
  ├─▶ failed to resolve pypi dependencies
  ╰─▶ Because you require pandas>=2.0.0,<3.0.0 and pandas==3.0.1, we can conclude that your requirements are unsatisfiable.
  help: The following PyPI packages have been pinned by the conda solve, and this version may be causing a conflict:
        pandas==3.0.1
        See https://pixi.sh/latest/concepts/conda_pypi/#pinned-package-conflicts for more information.

I had to remove the <3.0.0 constraint from pandas in the pyproject.toml file to get it to build. After that I ran the tests. I believe all the newly added tests for the stats service passed.

dataretrieval-python % pytest -vv tests/waterdata_test.py    
========================================================================================================== test session starts ===========================================================================================================
platform darwin -- Python 3.14.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/jkwang/Desktop/data-ret-test/dataretrieval-python/.pixi/envs/default/bin/python3.14
cachedir: .pytest_cache
rootdir: /Users/jkwang/Desktop/data-ret-test/dataretrieval-python
configfile: pyproject.toml
collected 25 items                                                                                                                                                                                                                       

tests/waterdata_test.py::test_mock_get_samples ERROR                                                                                                                                                                               [  4%]
tests/waterdata_test.py::test_check_profiles PASSED                                                                                                                                                                                [  8%]
tests/waterdata_test.py::test_samples_results PASSED                                                                                                                                                                               [ 12%]
tests/waterdata_test.py::test_samples_activity PASSED                                                                                                                                                                              [ 16%]
tests/waterdata_test.py::test_samples_locations PASSED                                                                                                                                                                             [ 20%]
tests/waterdata_test.py::test_samples_projects PASSED                                                                                                                                                                              [ 24%]
tests/waterdata_test.py::test_samples_organizations PASSED                                                                                                                                                                         [ 28%]
tests/waterdata_test.py::test_get_daily PASSED                                                                                                                                                                                     [ 32%]
tests/waterdata_test.py::test_get_daily_properties PASSED                                                                                                                                                                          [ 36%]
tests/waterdata_test.py::test_get_daily_properties_id PASSED                                                                                                                                                                       [ 40%]
tests/waterdata_test.py::test_get_daily_no_geometry PASSED                                                                                                                                                                         [ 44%]
tests/waterdata_test.py::test_get_continuous FAILED                                                                                                                                                                                [ 48%]
tests/waterdata_test.py::test_get_monitoring_locations PASSED                                                                                                                                                                      [ 52%]
tests/waterdata_test.py::test_get_monitoring_locations_hucs PASSED                                                                                                                                                                 [ 56%]
tests/waterdata_test.py::test_get_latest_continuous FAILED                                                                                                                                                                         [ 60%]
tests/waterdata_test.py::test_get_latest_daily PASSED                                                                                                                                                                              [ 64%]
tests/waterdata_test.py::test_get_latest_daily_properties_geometry PASSED                                                                                                                                                          [ 68%]
tests/waterdata_test.py::test_get_field_measurements PASSED                                                                                                                                                                        [ 72%]
tests/waterdata_test.py::test_get_time_series_metadata PASSED                                                                                                                                                                      [ 76%]
tests/waterdata_test.py::test_get_reference_table PASSED                                                                                                                                                                           [ 80%]
tests/waterdata_test.py::test_get_reference_table_with_query PASSED                                                                                                                                                                [ 84%]
tests/waterdata_test.py::test_get_reference_table_wrong_name PASSED                                                                                                                                                                [ 88%]
tests/waterdata_test.py::test_get_por_stats PASSED                                                                                                                                                                                 [ 92%]
tests/waterdata_test.py::test_get_por_stats_expanded_false PASSED                                                                                                                                                                  [ 96%]
tests/waterdata_test.py::test_get_date_range_stats PASSED                                                                                                                                                                          [100%]

================================================================================================================= ERRORS =================================================================================================================
________________________________________________________________________________________________ ERROR at setup of test_mock_get_samples _________________________________________________________________________________________________
file /Users/jkwang/Desktop/data-ret-test/dataretrieval-python/tests/waterdata_test.py, line 31
  def test_mock_get_samples(requests_mock):
E       fixture 'requests_mock' not found
>       available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, capteesys, doctest_namespace, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, subtests, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/jkwang/Desktop/data-ret-test/dataretrieval-python/tests/waterdata_test.py:31
================================================================================================================ FAILURES ================================================================================================================
__________________________________________________________________________________________________________ test_get_continuous ___________________________________________________________________________________________________________

    def test_get_continuous():
        df,_ = get_continuous(
            monitoring_location_id="USGS-06904500",
            parameter_code="00065",
            time="2025-01-01/2025-12-31"
        )
        assert isinstance(df, DataFrame)
        assert "geometry" not in df.columns
        assert df.shape[1] == 11
>       assert df['time'].dtype == 'datetime64[ns, UTC]'
E       AssertionError: assert datetime64[us, UTC] == 'datetime64[ns, UTC]'
E        +  where datetime64[us, UTC] = 0       2025-01-01 00:00:00+00:00\n1       2025-01-01 00:15:00+00:00\n2       2025-01-01 00:30:00+00:00\n3       2025-01-01 00:45:00+00:00\n4       2025-01-01 01:00:00+00:00\n                   ...           \n34525   2025-12-30 23:00:00+00:00\n34526   2025-12-30 23:15:00+00:00\n34527   2025-12-30 23:30:00+00:00\n34528   2025-12-30 23:45:00+00:00\n34529   2025-12-31 00:00:00+00:00\nName: time, Length: 34530, dtype: datetime64[us, UTC].dtype

tests/waterdata_test.py:179: AssertionError
_______________________________________________________________________________________________________ test_get_latest_continuous _______________________________________________________________________________________________________

    def test_get_latest_continuous():
        df, md = get_latest_continuous(
            monitoring_location_id=["USGS-05427718", "USGS-05427719"],
            parameter_code=["00060", "00065"]
        )
        assert "latest_continuous_id" == df.columns[-1]
        assert df.shape[0] <= 4
        assert df.statistic_id.unique().tolist() == ["00011"]
        assert hasattr(md, 'url')
        assert hasattr(md, 'query_time')
>       assert df['time'].dtype == 'datetime64[ns, UTC]'
E       AssertionError: assert datetime64[us, UTC] == 'datetime64[ns, UTC]'
E        +  where datetime64[us, UTC] = 0   2026-02-24 14:00:00+00:00\n1   2026-02-24 14:00:00+00:00\nName: time, dtype: datetime64[us, UTC].dtype

tests/waterdata_test.py:207: AssertionError
============================================================================================================ warnings summary ============================================================================================================
dataretrieval/__init__.py:9
  /Users/jkwang/Desktop/data-ret-test/dataretrieval-python/dataretrieval/__init__.py:9: DeprecationWarning: The 'nwis' services are deprecated and being decommissioned. Please use the 'waterdata' module to access the new services.
    from dataretrieval.nwis import *

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================================================== short test summary info =========================================================================================================
FAILED tests/waterdata_test.py::test_get_continuous - AssertionError: assert datetime64[us, UTC] == 'datetime64[ns, UTC]'
 +  where datetime64[us, UTC] = 0       2025-01-01 00:00:00+00:00\n1       2025-01-01 00:15:00+00:00\n2       2025-01-01 00:30:00+00:00\n3       2025-01-01 00:45:00+00:00\n4       2025-01-01 01:00:00+00:00\n                   ...           \n34525   2025-12-30 23:00:00+00:00\n34526   2025-12-30 23:15:00+00:00\n34527   2025-12-30 23:30:00+00:00\n34528   2025-12-30 23:45:00+00:00\n34529   2025-12-31 00:00:00+00:00\nName: time, Length: 34530, dtype: datetime64[us, UTC].dtype
FAILED tests/waterdata_test.py::test_get_latest_continuous - AssertionError: assert datetime64[us, UTC] == 'datetime64[ns, UTC]'
 +  where datetime64[us, UTC] = 0   2026-02-24 14:00:00+00:00\n1   2026-02-24 14:00:00+00:00\nName: time, dtype: datetime64[us, UTC].dtype
ERROR tests/waterdata_test.py::test_mock_get_samples
=========================================================================================== 2 failed, 22 passed, 1 warning, 1 error in 16.10s ============================================================================================

ehinman · 2026-02-24T14:45:53Z

@jeffskwang-usgs, thanks for running these tests on your machine! I believe the first error is due to the fact that you do not have all the modules installed to run the tests, namely requests-mock. Is that correct? At any rate, that module is not used by the unit tests added in this MR. The other two failures I believe are related to differences between pandas 2.x.x and pandas 3.x.x: the latter uses a different time resolution (us rather than ns). Tim and I have discussed bumping up the dependency to include 3.x.x, but decided that would be its own separate MR.
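One way the two failing assertions could be made independent of the pandas version is to check for a timezone-aware UTC datetime dtype without comparing the resolution string. The sketch below is an assumption about how such a check might look, not something adopted in this PR; the helper name `is_utc_datetime` is hypothetical.

```python
import pandas as pd


def is_utc_datetime(series: pd.Series) -> bool:
    """Return True for a tz-aware UTC datetime column, whatever its resolution."""
    dtype = series.dtype
    # DatetimeTZDtype covers both datetime64[ns, UTC] and datetime64[us, UTC],
    # so the check does not depend on the default unit of the installed pandas.
    return isinstance(dtype, pd.DatetimeTZDtype) and str(dtype.tz) == "UTC"


# Toy timestamps in the same 15-minute spacing as the test output above.
times = pd.Series(
    pd.to_datetime(["2025-01-01 00:00:00", "2025-01-01 00:15:00"], utc=True)
)
print(times.dtype, is_utc_datetime(times))
```

A test could then use `assert is_utc_datetime(df['time'])` in place of the string comparison against 'datetime64[ns, UTC]'.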

jeffskwang-usgs · 2026-02-24T15:41:50Z

That's right, I was able to get that part to pass after using pixi to install it:

dataretrieval-python % pixi add pytest requests-mock
 WARN The package `pytest-cov==7.0.0` does not have an extra named `all`
✔ Added pytest >=9.0.2,<10
✔ Added requests-mock >=1.12.1,<2

dataretrieval-python % pytest -vv tests/waterdata_test.py                                   
===================================================================================== test session starts ======================================================================================
platform darwin -- Python 3.14.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/jkwang/Desktop/data-ret-test/dataretrieval-python/.pixi/envs/default/bin/python3.14
cachedir: .pytest_cache
rootdir: /Users/jkwang/Desktop/data-ret-test/dataretrieval-python
configfile: pyproject.toml
plugins: requests-mock-1.12.1
collected 25 items                                                                                                                                                                             

tests/waterdata_test.py::test_mock_get_samples PASSED

ehinman added 7 commits December 29, 2025 21:10

initial code for stats service

c067597

modularize stats function, fix paging

e00bb69

fix input types

00d0cf9

I don't think this is necessary here, as non-200's are treated as errors

65dff7f

get rid of warnings.warn

c17a07f

fix non geopandas flattening

f7ee053

break up function into two, add in further unnesting

1865aa3

jzemmels reviewed Jan 9, 2026

dataretrieval/waterdata/api.py (outdated)

jzemmels reviewed Jan 9, 2026

dataretrieval/waterdata/api.py (outdated)

jzemmels closed this Jan 9, 2026

jzemmels reopened this Jan 9, 2026

Apply suggestions from code review

98e3e86

Co-authored-by: Joe Zemmels (he/him) <jzemmels@gmail.com>

jzemmels self-requested a review February 5, 2026 18:39

ehinman added 5 commits February 20, 2026 14:51

merge with main, add in percentiles for max, min, median, add example

4c46ead

fix issue with column name not always being present given the inputs,…

709450a

… add more examples

add unit tests, correct issue with init file

7127924

fix testing to take into account geopandas?

a21a464

clean up documentation descriptions a little bit

8f2bc0a

ehinman requested a review from jeffskwang-usgs February 20, 2026 22:50

ehinman marked this pull request as ready for review February 20, 2026 22:51

jzemmels approved these changes Feb 23, 2026

ehinman added 2 commits February 24, 2026 09:20

change function names, add more to documentation

193205e

alphabetical order

8ff1ae5

ehinman merged commit 4dc9f6a into DOI-USGS:main Feb 24, 2026
7 of 13 checks passed
